An augmented context free grammar
Authors
Remko Scha and Livia Polanyi
BBN Laboratories, 10 Moulton Street, Cambridge, MA 02238
Abstract
This paper presents an augmented context free grammar which describes important features of the surface structure and the semantics of discourse in a formal way, integrating new as well as previously existing insights into a unified framework. The structures covered include lists, narratives, subordinating and coordinating rhetorical relations, topic chains and interruptions. The paper discusses the problem of parsing discourse, and compares different grammatical formalisms which could be used for describing discourse structure.
1. Introduction
Though a wealth of insights on the structure and meaning of discourse has been gathered by researchers in linguistics, psychology, ethnomethodology and artificial intelligence, these insights have not been integrated into formal grammars which display the breadth, depth and precision of formal treatments of sentential syntax and semantics. In the present paper we make a step towards a formal, integrated description of the surface structure and semantic interpretation of discourse. We introduce a formalism which uses augmented context free rules for specifying discourse grammars, and demonstrate its viability by developing a set of syntactic/semantic rules which covers a number of important discourse phenomena. We discuss the issue of parsing discourse in a semi-deterministic left-to-right fashion, and relate the grammar presented here to the strategies outlined in [24] [25] for building up a structural description of an unfolding discourse. Finally, we compare the formalism used here to some possible alternatives.
2. Discourse Structure and Discourse Semantics
The semantic interpretation of the utterances in a discourse has been shown to depend on the structural relations obtaining among the segments of that discourse [8] [17]. In developing a grammar for the surface structure of discourse, it is our aim to account for the semantically relevant aspects of its structure. Two phenomena of fundamental importance for semantic interpretation depend crucially on discourse structure: context dependence and rhetorical structure.
Context Dependence of Utterance Meanings
Context dependence of utterance meanings is a pervasive phenomenon in language. When an utterance is analysed in isolation, its meaning is underdetermined in many ways as an effect of the following phenomena:
• Indexicality. One rarely makes a statement or asks a question about the universe at large: every utterance presupposes an implicit temporal, spatial and topical framework that constrains the scopes of the meanings of the constituents of the utterance. Every sentence must therefore be evaluated with respect to a frame of reference which is usually left implicit in the sentence itself.
• Anaphora. Many utterances contain pronouns and definite descriptions which constitute overt references to the previous discourse. An analysis of the structure of the previous discourse is necessary to resolve such anaphoric references correctly.
• Implicit arguments. Many natural language words are semantically unsaturated, needing externally provided arguments in order to be interpreted felicitously. Nevertheless, such constructions can be used in sentences which do not mention their arguments explicitly, if these arguments can be inherited from previous parts of the discourse. ("John is taller [than Peter]." "What is the speed [of John's car]?") (Cf. [4].)
The Rhetorical Structure of Discourse
The insight of work focused on the rhetorical structure of discourse is that a speaker engaged in a discourse may perform speech acts whose illocutionary force has scope over complex propositions which are built up out of individual sentence meanings and the rhetorical relations between them. (See, for example, [13], [19], and [20].) These rhetorical relations may be overtly expressed, or they may have to be abducted on the basis of the sentence meanings. Paradigmatic examples of such complex propositions are "A caused B", "A was caused by B", "A provides evidence for B", etc., where A and B may stand for propositions expressed by individual sentences, or may themselves be complex propositions expressed by discourse segments. Because of the latter case, a correct constituent analysis of the discourse is necessary to establish the arguments of the rhetorical relations. On the other hand, the rhetorical relations themselves constitute an important structure-building component of discourse.
In the approach to discourse semantics outlined in this paper, every sentence is initially interpreted in a local, context-independent fashion. This results in a meaning representation which will usually contain free variables, standing for the discourse-dependent elements in the utterance meaning. When a sentence is integrated into the ongoing discourse, these variables are bound to values picked up from the context into which the sentence is inserted. The next section describes this in more detail.
3. An Augmented Context Free Grammar for Discourse
Discourses have a hierarchical structure. They are built up recursively out of units of various kinds which can occur as constituents of each other. To account for this, a discourse grammar must be able to assign a tree structure to a discourse. We call this tree structure the discourse parse tree. To describe in a formal way how a discourse parse tree is constructed out of constituent sentences, we use a context free grammar whose non-terminal symbols are augmented with attribute/value pairs. (Distinct non-terminal categories have distinct sets of attributes.) Context free rules describe how the constituent segments of a discourse (which we call discourse constituent units or dcu's) are built up out of their subconstituents.
The values of the attributes on a non-terminal represent the relevant structural and semantic properties of the dcu generated by that non-terminal. Every attribute has a fixed set of possible value-expressions. The value-expressions may be of different kinds: they may be atomic, they may themselves be sets of attribute/value pairs, or they may be logical expressions. Value-expressions often store partial information; therefore, they may contain free variables. A value-expression stands for the set of its ground instances. (A value-expression without variables thus stands for a singleton set.) When an attribute has a value-expression which stands for the empty set, the complex category symbol that contains this value-expression fails to label a possible dcu.
The context-free rules enforce agreement and upwards-inheritance of the relevant properties of different constituents through the fact that different occurrences of the same variable take on identical values. To put this in more precise terms, we define the meaning of an augmented context free rule as follows. If A, B, C, Y, Z stand for complex category symbols (including attribute/value-expression pairs) and $[dcu_1]_Y$ and $[dcu_2]_Z$ are legitimate dcu's, the rule "A => B C" legitimizes the dcu $\sigma([[dcu_1]_Y\ [dcu_2]_Z]_A)$ iff the substitution $\sigma$ is the most general unifier¹ of the terms $\langle B, C \rangle$ and $\langle Y, Z \rangle$, and $\sigma(A)$ is a legitimate complex category symbol, not containing empty attribute-value expressions.
3.1. Attribute Values and Semantic Interpretation
The propagation of attribute values between dcu's plays an important part in establishing the semantic interpretation of the utterances in a discourse. We now list some of these attributes, and discuss their role in semantic interpretation and discourse processing.
The Semantics attribute records a logical formula representing the meaning of the discourse constituent unit that it is associated with.
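The rule semantics defined in Section 3 (a rule "A => B C" applies to two dcu's when the daughter categories unify with the categories of those dcu's) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Var` class, the dictionary encoding of complex category symbols, the attribute names `cat`, `topic` and `sem`, and the sample rule are hypothetical and are not taken from the paper. The sketch omits the occurs check and the test for empty value-expressions.

```python
class Var:
    """A free variable inside a value-expression (hypothetical encoding)."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "?" + self.name


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return a substitution extending `subst` that unifies a and b, or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a is b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, dict) and isinstance(b, dict):
        if a.keys() != b.keys():  # same category, same attribute set
            return None
        for key in a:
            subst = unify(a[key], b[key], subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None


def resolve(term, subst):
    """Apply a substitution throughout a term."""
    term = walk(term, subst)
    if isinstance(term, dict):
        return {k: resolve(v, subst) for k, v in term.items()}
    return term


def apply_rule(rule, y, z):
    """For rule (A, B, C): unify <B, C> with <Y, Z> and return sigma(A), or None."""
    a, b, c = rule
    subst = unify(b, y, {})
    if subst is not None:
        subst = unify(c, z, subst)
    return None if subst is None else resolve(a, subst)


# Hypothetical rule: two dcu's sharing a topic form a list whose
# semantics conjoins the semantics of the two daughters.
T, S1, S2 = Var("T"), Var("S1"), Var("S2")
rule = (
    {"cat": "list", "topic": T, "sem": {"op": "and", "left": S1, "right": S2}},
    {"cat": "dcu", "topic": T, "sem": S1},
    {"cat": "dcu", "topic": T, "sem": S2},
)
dcu1 = {"cat": "dcu", "topic": "trip", "sem": "p1"}
dcu2 = {"cat": "dcu", "topic": Var("U"), "sem": "p2"}     # topic left open
dcu3 = {"cat": "dcu", "topic": "weather", "sem": "p3"}    # conflicting topic

parent = apply_rule(rule, dcu1, dcu2)
rejected = apply_rule(rule, dcu1, dcu3)
print(parent)
print(rejected)
```

In this sketch the shared variable `T` is what enforces agreement between the daughters and carries the topic upwards to the parent, while the open topic of `dcu2` is bound from its sister in the way the text describes for free variables.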
Anaphoric elements which have not been resolved inside the dcu are represented by free variables. There are two mechanisms for anaphor resolution: (1) unification with value-expressions of attributes of other dcu's, and (2) explicit search processes involving the Discourse Referents Set of accessible dcu's.
The Discourse Referents Set records the entities introduced in the discourse unit that it is associated with. These, plus the entities in the Discourse Referents Sets of the embedding units (dominating nodes in the tree), are the entities which are available for anaphoric reference in an utterance which extends or expands that discourse unit. Every Discourse Referent is a pair consisting of (1) the linguistic expression that introduced it, and (2) the semantic representation that the system attached to that expression. Discourse Referents Sets are accessed in different ways by a number of algorithms, which resolve the meanings of
¹ The concepts we use here are essentially the ones that were developed for term unification in first-order logic theorem proving [30]. We assume some straightforward generalization of these concepts which deals with the fact that our "terms" have a structure which, though not interestingly different, is somewhat richer.
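The accessibility condition on Discourse Referents Sets described above (a unit's own referents plus those of the units dominating it in the parse tree) can be sketched as follows. This is an illustrative sketch only: the `DCU` class, its field names, and the innermost-first ordering of the result are assumptions, and the paper's actual resolution algorithms are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A Discourse Referent: (introducing expression, semantic representation).
Referent = Tuple[str, str]


@dataclass
class DCU:
    """A node of the discourse parse tree (hypothetical minimal encoding)."""
    label: str
    referents: List[Referent] = field(default_factory=list)
    parent: Optional["DCU"] = None


def accessible_referents(unit: DCU) -> List[Referent]:
    """Collect the referents of `unit` and of every node dominating it."""
    found: List[Referent] = []
    node: Optional[DCU] = unit
    while node is not None:
        found.extend(node.referents)
        node = node.parent
    return found


# Hypothetical tree: a narrative with two embedded units.
story = DCU("narrative", [("John", "john1")])
aside = DCU("subordinated", [("a car", "car1")], parent=story)
other = DCU("subordinated", [("Mary", "mary1")], parent=story)

available = accessible_referents(aside)
print(available)
```

Referents introduced in a sister unit (here `other`) are not returned for `aside`, since that unit does not dominate it.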